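Large-scale vector mapping is important for transportation, city planning, and survey and census work. We propose GraphMapper, a unified framework for end-to-end vector map extraction from satellite images. Our key idea is a novel unified representation of shapes of different topologies, called a "primitive graph", which is a set of shape primitives together with their pairwise relationship matrix. We then convert vector shape prediction, regularization, and topology reconstruction into a single primitive graph learning problem. Specifically, GraphMapper is a generic primitive graph learning network based on global shape context modeling through multi-head attention. An embedding space sorting method is developed for accurate primitive relationship modeling. We empirically demonstrate the effectiveness of GraphMapper on two challenging mapping tasks: building footprint regularization and road network topology reconstruction. Our model outperforms state-of-the-art methods on both tasks on public benchmarks. All code will be made publicly available.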
translated by Google Translate
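We present MVLayoutNet, an end-to-end network for holistic 3D reconstruction from multi-view panoramas. Our core contribution is to seamlessly combine learned monocular layout estimation with multi-view stereo (MVS) for accurate layout reconstruction in both 3D and image space. We jointly train a layout module to produce an initial layout and a novel MVS module to obtain accurate layout geometry. Unlike standard MVSNet [33], our MVS module takes a newly proposed layout cost volume, which aggregates the multi-view costs at the same depth layer into the corresponding layout elements. We also provide an attention-based scheme that guides the MVS module to focus on structural regions. This design considers both local pixel-level costs and global holistic information for better reconstruction. Experiments show that our method outperforms the state of the art in depth RMSE by 21.7% and 20.6% on the 2D-3D-S [1] and ZInD [5] datasets, respectively. Finally, our method yields coherent layout geometry that enables the reconstruction of an entire scene.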
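This paper focuses on the learning rate analysis of Nyström regularization with sequential sub-sampling for $\tau$-mixing time series. Using a recently developed Banach-valued Bernstein inequality for $\tau$-mixing sequences and an integral operator approach based on second-order decomposition, we derive optimal learning rates for Nyström regularization with sequential sub-sampling for $\tau$-mixing time series. A series of numerical experiments is carried out to verify our theoretical results, showing the excellent learning performance of Nyström regularization with sequential sub-sampling on large-scale time series data. Together, these results extend the applicable range of Nyström regularization from i.i.d. samples to non-i.i.d. sequences.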
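As a rough illustration of the estimator being analyzed, the sketch below fits Nyström-regularized kernel regression to a toy autoregressive series. It is a minimal sketch under stated assumptions, not the authors' implementation: "sequential sub-sampling" is read here as taking a contiguous block (the first `m` samples) as landmarks, and the Gaussian kernel, its width, the small diagonal `jitter`, and the helper names `solve_linear`, `nystrom_fit`, `nystrom_predict`, and `make_series` are all hypothetical choices made for the example.

```python
import math
import random


def gaussian_kernel(a, b, width=1.0):
    """Gaussian kernel between two scalars (kernel choice is an assumption)."""
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def nystrom_fit(xs, ys, m, lam=1e-3, jitter=1e-8):
    """Fit f(x) = sum_j alpha_j K(x, z_j) with landmarks z = first m samples.

    Solves (K_nm^T K_nm + lam * n * K_mm + jitter * I) alpha = K_nm^T y.
    The jitter term is added only for numerical stability (an assumption).
    """
    n = len(xs)
    landmarks = xs[:m]  # contiguous block: one reading of sequential sub-sampling
    K_nm = [[gaussian_kernel(x, z) for z in landmarks] for x in xs]
    K_mm = [[gaussian_kernel(z1, z2) for z2 in landmarks] for z1 in landmarks]
    A = [[sum(K_nm[i][p] * K_nm[i][q] for i in range(n)) + lam * n * K_mm[p][q]
          for q in range(m)] for p in range(m)]
    for p in range(m):
        A[p][p] += jitter
    b = [sum(K_nm[i][p] * ys[i] for i in range(n)) for p in range(m)]
    return landmarks, solve_linear(A, b)


def nystrom_predict(landmarks, alpha, x):
    """Evaluate the fitted function at a single point x."""
    return sum(a * gaussian_kernel(x, z) for a, z in zip(alpha, landmarks))


def make_series(n, seed=0):
    """Toy AR(1) series x_{t+1} = 0.5 x_t + noise, used as dependent data."""
    rng = random.Random(seed)
    series = [0.0]
    for _ in range(n):
        series.append(0.5 * series[-1] + rng.uniform(-1.0, 1.0))
    return series


# One-step-ahead prediction: learn x_{t+1} from x_t.
series = make_series(200)
xs, ys = series[:-1], series[1:]
landmarks, alpha = nystrom_fit(xs, ys, m=8)
preds = [nystrom_predict(landmarks, alpha, x) for x in xs]
mse = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)
print("training MSE:", mse)
```

Only an `m` by `m` system is solved, which is where the computational saving over full kernel ridge regression comes from. How `m` and `lam` should scale with the sample size is what the learning rate analysis addresses.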
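Vision-language pretraining (VLP) models have recently facilitated many cross-modal downstream tasks. Most existing works evaluate their systems by comparing fine-tuned downstream task performance. However, average downstream task accuracy alone provides little information about the strengths and weaknesses of each VLP method, let alone insights into how the community can improve these systems. Inspired by CheckList for testing natural language processing models, we introduce VL-CheckList, a novel framework for understanding the capabilities of VLP models. The proposed method divides the image-text ability of a VLP model into three categories (objects, attributes, and relations) and uses a novel taxonomy to further break down these three aspects. We conduct a comprehensive study that analyzes seven recently popular VLP models with the proposed framework. The results confirm the effectiveness of the proposed method by revealing fine-grained differences among the models that are not visible from downstream-task-only evaluation. Further results point to promising research directions for building better VLP models. Data and code: https://github.com/om--ai-lab/vl-checklist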
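We propose a novel software service recommendation model to help users find suitable repositories on GitHub. Our model first designs a novel context-induced repository graph embedding method that leverages the rich contextual information of repositories to alleviate the difficulties caused by data sparsity. It then exploits, for the first time in the software service recommendation field, the sequence information of user-repository interactions. Specifically, a deep-learning-based sequential recommendation technique is adopted to capture the dynamics of user preferences. Comprehensive experiments were conducted on a large dataset collected from GitHub against a list of existing methods. The results illustrate the superiority of our method in various respects.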
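Projection techniques are often used to visualize high-dimensional data, allowing users to better understand the overall structure of a multi-dimensional space on a 2D screen. Although many such methods exist, comparatively little work has been done on generalizable methods for inverse projection, that is, the process of mapping projected points, or more generally the projection space, back to the original high-dimensional space. In this paper we present NNInv, a deep learning technique able to approximate the inverse of any projection or mapping. NNInv learns to reconstruct high-dimensional data from any arbitrary point on the 2D projection space, giving users of a visual analytics system the ability to interact with the learned high-dimensional representation. We provide an analysis of the parameter space of NNInv and offer guidance on selecting these parameters. We further validate the effectiveness of NNInv through a series of quantitative and qualitative analyses. We then demonstrate the method's utility by applying it to three visualization tasks: interactive instance interpolation, classifier agreement, and gradient visualization.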
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains robust even when the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from a trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent, neural-network-based PDE solver, suggest a way to overcome this challenge. In this emerging direction, the Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and on ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest that KoopmanLab has potential for diverse applications of partial differential equations.
Rankings are widely collected in various real-life scenarios, leading to the leakage of personal information such as users' preferences on videos or news. To protect rankings, existing works mainly develop privacy protection for a single ranking within a set of rankings, or for pairwise comparisons of a ranking, under $\epsilon$-differential privacy. This paper proposes a novel notion called $\epsilon$-ranking differential privacy for protecting ranks. We establish the connection between the Mallows model (Mallows, 1957) and the proposed $\epsilon$-ranking differential privacy. This allows us to develop a multistage ranking algorithm that generates synthetic rankings while satisfying the developed $\epsilon$-ranking differential privacy. Theoretical results regarding the utility of synthetic rankings in downstream tasks, including the inference attack and the personalized ranking tasks, are established. For the inference attack, we quantify how $\epsilon$ affects the estimation of the true ranking based on synthetic rankings. For the personalized ranking task, we consider varying privacy preferences among users and quantify how their privacy preferences affect the consistency in estimating the optimal ranking function. Extensive numerical experiments are carried out to verify the theoretical results and demonstrate the effectiveness of the proposed synthetic ranking algorithm.
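For readers unfamiliar with the Mallows model, the following is a minimal sketch of how a noisy ranking can be drawn around a reference ranking. It uses the standard repeated insertion construction with the Kendall tau distance and is not the paper's multistage algorithm. The dispersion parameter `theta` and the function names are hypothetical, and how `theta` would be calibrated to a given $\epsilon$ is not specified here.

```python
import math
import random


def kendall_tau(ranking, reference):
    """Count item pairs that the two rankings order differently."""
    pos = {item: i for i, item in enumerate(ranking)}
    mapped = [pos[item] for item in reference]
    return sum(
        1
        for i in range(len(mapped))
        for j in range(i + 1, len(mapped))
        if mapped[i] > mapped[j]
    )


def sample_mallows(reference, theta, rng):
    """Draw a ranking with P(r) proportional to exp(-theta * kendall_tau(r, reference)).

    Repeated insertion: the i-th reference item is placed ahead of k of the
    items already inserted, which creates k inversions, with weight
    exp(-theta * k).
    """
    ranking = []
    for i, item in enumerate(reference):
        weights = [math.exp(-theta * k) for k in range(i + 1)]
        k = rng.choices(range(i + 1), weights=weights)[0]
        ranking.insert(i - k, item)
    return ranking


# Larger theta keeps samples close to the reference; theta = 0 is uniform.
reference = ["a", "b", "c", "d", "e"]
rng = random.Random(0)
for theta in (0.0, 1.0, 3.0):
    draw = sample_mallows(reference, theta, rng)
    print(theta, draw, kendall_tau(draw, reference))
```

The trade-off the abstract describes is visible in this toy setting: a small `theta` hides the true ranking well but produces synthetic rankings that are less useful for estimating it.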
Because they offer more comprehensive information than data from a single view, multi-view (multi-source, multi-modal, multi-perspective, etc.) data are being used more frequently in remote sensing tasks. However, as the number of views grows, the issue of data quality becomes more apparent, limiting the potential benefits of multi-view data. Although recent deep neural network (DNN) based models can learn the weight of data adaptively, they do not explicitly quantify the data quality of each view when fusing them, which leaves these models hard to interpret and makes them perform unsatisfactorily and inflexibly in downstream remote sensing tasks. To fill this gap, this paper introduces evidential deep learning to the task of aerial-ground dual-view remote sensing scene classification to model the credibility of each view. Specifically, the theory of evidence is used to calculate an uncertainty value that describes the decision-making risk of each view. Based on this uncertainty, a novel decision-level fusion strategy is proposed to ensure that the view with lower risk obtains more weight, making the classification more credible. On two well-known, publicly available datasets of aerial-ground dual-view remote sensing images, the proposed approach achieves state-of-the-art results, demonstrating its effectiveness. The code and datasets of this article are available at the following address: https://github.com/gaopiaoliang/Evidential.
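The idea of weighting views by their uncertainty can be sketched as follows. This is a minimal illustration, not the paper's fusion strategy. It uses the common evidential deep learning convention in which non-negative per-class evidence defines a Dirichlet distribution with parameters `evidence + 1` and uncertainty `K / S`. The weighting rule `1 - uncertainty` and the function names are assumptions made for the example.

```python
def dirichlet_opinion(evidence):
    """Turn non-negative per-class evidence into probabilities and uncertainty.

    alpha_k = e_k + 1, S = sum(alpha), p_k = alpha_k / S, u = K / S.
    """
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    probs = [a / strength for a in alpha]
    uncertainty = num_classes / strength
    return probs, uncertainty


def fuse_views(evidence_aerial, evidence_ground):
    """Decision-level fusion that gives the less uncertain view more weight.

    The weight (1 - u) is an illustrative choice, not the paper's rule.
    """
    p_a, u_a = dirichlet_opinion(evidence_aerial)
    p_g, u_g = dirichlet_opinion(evidence_ground)
    w_a, w_g = 1.0 - u_a, 1.0 - u_g
    total = w_a + w_g
    if total == 0.0:  # both views carry no evidence: fall back to equal weights
        w_a, w_g, total = 0.5, 0.5, 1.0
    return [(w_a * a + w_g * g) / total for a, g in zip(p_a, p_g)]


# The aerial view is confident about class 0; the ground view weakly prefers class 1.
fused = fuse_views([9.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(fused)
```

Because the aerial view carries far more evidence, its uncertainty is lower and its prediction dominates the fused result.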